-
Notifications
You must be signed in to change notification settings - Fork 12
build(deps): bump pip from 23.2.1 to 23.3 in /drivers/gpu/drm/ci/xfails #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
dependabot
wants to merge
21
commits into
combined_queue
Choose a base branch
from
dependabot/pip/drivers/gpu/drm/ci/xfails/pip-23.3
base: combined_queue
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
build(deps): bump pip from 23.2.1 to 23.3 in /drivers/gpu/drm/ci/xfails #1
dependabot
wants to merge
21
commits into
combined_queue
from
dependabot/pip/drivers/gpu/drm/ci/xfails/pip-23.3
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In the diagram of the topology, $swp3 and $swp4 are described as 1Gbps ports. This is wrong information, the test does not configure such speed. Fixes: bfa8047 ("selftests: mlxsw: Add a PFC test") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic using a shaper somewhere in the flow of the packets. In this area, the buffer is smaller than the buffer at the beginning of the flow, so it fills up until there is no more space left. The test configures there PFC which is supposed to notice that the headroom is filling up and send PFC Xoff to indicate the transmitter to stop sending traffic for the priorities sharing this PG. The Xon/Xoff threshold is auto-configured and always equal to 2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet, traffic will keep arriving until the transmitter receives and processes the PFC packet. This amount of traffic is known as the PFC delay allowance. Currently the buffer for the delay traffic is configured as 100KB. The MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB. This allows 80KB extra to be stored in this buffer. 8-lane ports use two buffers among which the configured buffer is split, the Xoff threshold then applies to each buffer in parallel. The test does not take into account the behavior of 8-lane ports, when the ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes, packets are dropped and the test fails. Check if the relevant ports use 8 lanes, in such case double the size of the buffer, as the headroom is split half-half. Fixes: bfa8047 ("selftests: mlxsw: Add a PFC test") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
The flags are not used outside of the C file so move them there. Suggested-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Device drivers register with devlink from their probe routines (under
the device lock) by acquiring the devlink instance lock and calling
devl_register().
Drivers that support a devlink reload usually implement the
reload_{down, up}() operations in a similar fashion to their remove and
probe routines, respectively.
However, while the remove and probe routines are invoked with the device
lock held, the reload operations are only invoked with the devlink
instance lock held. It is therefore impossible for drivers to acquire
the device lock from their reload operations, as this would result in
lock inversion.
The motivating use case for invoking the reload operations with the
device lock held is in mlxsw which needs to trigger a PCI reset as part
of the reload. The driver cannot call pci_reset_function() as this
function acquires the device lock. Instead, it needs to call
__pci_reset_function_locked which expects the device lock to be held.
To that end, adjust devlink to always acquire the device lock before the
devlink instance lock when performing a reload.
For now, only do that when reload is triggered as part of netns
dismantle. Subsequent patches will handle the case where reload is
explicitly triggered by user space.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Currently, private flags (e.g., 'DEVLINK_NL_FLAG_NEED_PORT') are only used in pre_doit operations, but a subsequent patch will need to conditionally lock and unlock the device lock in pre and post doit operations, respectively. As a preparation, enable the use of private flags in post_doit operations in a similar fashion to how it is done for pre_doit operations. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Introduce a new private flag ('DEVLINK_NL_FLAG_NEED_DEV_LOCK') to allow
netlink commands to specify that they need to acquire the device lock in
their pre_doit operation and release it in their post_doit operation.
The reload command will use this flag in the subsequent patch.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Device drivers register with devlink from their probe routines (under
the device lock) by acquiring the devlink instance lock and calling
devl_register().
Drivers that support a devlink reload usually implement the
reload_{down, up}() operations in a similar fashion to their remove and
probe routines, respectively.
However, while the remove and probe routines are invoked with the device
lock held, the reload operations are only invoked with the devlink
instance lock held. It is therefore impossible for drivers to acquire
the device lock from their reload operations, as this would result in
lock inversion.
The motivating use case for invoking the reload operations with the
device lock held is in mlxsw which needs to trigger a PCI reset as part
of the reload. The driver cannot call pci_reset_function() as this
function acquires the device lock. Instead, it needs to call
__pci_reset_function_locked which expects the device lock to be held.
To that end, adjust devlink to always acquire the device lock before the
devlink instance lock when performing a reload.
Do that when reload is explicitly triggered by user space by specifying
the 'DEVLINK_NL_FLAG_NEED_DEV_LOCK' flag in the pre_doit and post_doit
operations of the reload command.
A previous patch already handled the case where reload is invoked as
part of netns dismantle.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Add an assert to verify that the device lock is always held throughout reload operations. Tested the following flows with netdevsim and mlxsw while lockdep is enabled: netdevsim: # echo "10 1" > /sys/bus/netdevsim/new_device # devlink dev reload netdevsim/netdevsim10 # ip netns add bla # devlink dev reload netdevsim/netdevsim10 netns bla # ip netns del bla # echo 10 > /sys/bus/netdevsim/del_device mlxsw: # devlink dev reload pci/0000:01:00.0 # ip netns add bla # devlink dev reload pci/0000:01:00.0 netns bla # ip netns del bla # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove # echo 1 > /sys/bus/pci/rescan Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a
reset (i.e., they advertise NoSoftRst-). However, this transition does
not have any effect on the device: It continues to be operational and
network ports remain up. Advertising this support makes it seem as if a
PM reset is viable for these devices. Mark it as unavailable to skip it
when testing reset methods.
Before:
# cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
pm bus
After:
# cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method
bus
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Currently, the time it took a PCI device to become ready after reset is
only printed if it was longer than 1000ms ('PCI_RESET_WAIT'). However,
for debugging purposes it is useful to know this time even if it was
shorter. For example, with the device I am working on, hardware
engineers asked to verify that it becomes ready on the first try (no
delay).
To that end, add a debug level print that can be enabled using dynamic
debug. Example:
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
# dmesg -c | grep ready
# echo "file drivers/pci/pci.c +p" > /sys/kernel/debug/dynamic_debug/control
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
# dmesg -c | grep ready
[ 396.060335] mlxsw_spectrum4 0000:01:00.0: ready 0ms after bus reset
# echo "file drivers/pci/pci.c -p" > /sys/kernel/debug/dynamic_debug/control
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
# dmesg -c | grep ready
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Currently mlxsw_reg_mrsr_pack() always sets 'command=1'. As preparation for support of new reset flow, pass the command as an argument to the function and add an enum for this field. For now, always pass 'command=1' to the pack() function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
In the next patches, mlxsw_pci_sw_reset() will be extended to support more reset types and will not necessarily issue a software reset. Rename the function to reflect that. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
In general, the existing flow of software reset in the driver is: 1. Wait for system ready status. 2. Send MRSR command, to start the reset. 3. Wait for system ready status. This flow will be extended once a new reset command is supported. As a preparation, move step #2 to a separate function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
The driver resets the device during probe and during a devlink reload. The current reset method reloads the current firmware version or a pending one, if one was previously flashed using devlink. However, the current reset method does not result in a PCI hot reset, preventing the PCI firmware from being upgraded, unless the system is rebooted. To solve this problem, a new reset command (6) was implemented in the firmware. Unlike the current command (1), after issuing the new command the device will not start the reset immediately, but only after a PCI hot reset. Implement the new reset method by first verifying that it is supported by the current firmware version by querying the Management Capabilities Mask (MCAM) register. If supported, issue the new reset command (6) via MRSR register followed by a PCI reset by calling __pci_reset_function_locked(). Once the PCI firmware is operational, go back to the regular reset flow and wait for the entire device to become ready. That is, repeatedly read the "system_status" register from the BAR until a value of "FW_READY" (0x5E) appears. Tested: # for i in $(seq 1 10); do devlink dev reload pci/0000:01:00.0; done Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com>
Implement reset_prepare() and reset_done() handlers that are invoked by
the PCI core before and after issuing a PCI reset, respectively.
Specifically, implement reset_prepare() by calling
mlxsw_core_bus_device_unregister() and reset_done() by calling
mlxsw_core_bus_device_register(). This is the same implementation as the
reload_{down,up}() devlink operations with the following differences:
1. The devlink instance is unregistered and then registered again after
the reset.
2. A reset via the device's command interface (using MRSR register) is
not issued during reset_done() as PCI core already issued a PCI
reset.
Tested:
# for i in $(seq 1 10); do echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset; done
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Test that PCI reset works correctly by verifying that only the expected reset methods are supported and that after issuing the reset the ifindex of the port changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com>
According to Documentation/networking/operstates.rst, the administrative
and operational state of an interface can be determined via both the
'ifi_flags' field in the ancillary header of a RTM_NEWLINK message and
the 'IFLA_OPERSTATE' attribute in the message.
When a driver signals loss of carrier via netif_carrier_off(), the
change is reflected immediately in the 'ifi_flags' field, but not in the
'IFLA_OPERSTATE' attribute. This is because changes in 'IFLA_OPERSTATE'
are performed in a link watch delayed work. From the document:
"Whenever the driver CHANGES one of these flags, a workqueue event is
scheduled to translate the flag combination to IFLA_OPERSTATE as
follows"
This means that it is possible for user space that is constantly
querying the kernel for interface state to see the output below when the
following commands are run. Their purpose is to simulate tearing down of
a selftest and start of a new one.
Commands:
# ip link set dev veth1 down
# ip link set dev veth0 down
# ip link set dev veth0 up
# ip link set dev veth1 up
Output:
11: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UP mode DEFAULT group default qlen 1000
link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
11: veth0@veth1: <BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue
state UP mode DEFAULT group default qlen 1000
link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
11: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noqueue
state DOWN mode DEFAULT group default qlen 1000
link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
11: veth0@veth1: <BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue
state UP mode DEFAULT group default qlen 1000
link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
11: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UP mode DEFAULT group default qlen 1000
link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
In the third line, the operational state is set by rtnl_fill_ifinfo() to
'IF_OPER_DOWN' because the interface is administratively down, but
'dev->operstate' is still 'IF_OPER_UP' as the delayed work has yet to
run. That is why the interface's operational state is reported as
'IF_OPER_UP' in the next line after it was put administratively up
again.
The output in the fourth line would make user space believe that the
interface is fully operational if only the 'IFLA_OPERSTATE' attribute is
considered. This is false as the interface does not have a carrier
('LOWER_UP' is not set) at this stage.
Solve this by determining if an interface is operational solely based on
the presence of the 'IFF_UP' and 'IFF_LOWER_UP' bits in the 'ifi_flags'
field.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Extend MFGD (Monitoring FW General Debug Register) with a new field, en_debug_assert that enables INTR severity for fatal events sent by MFDE. In addition, add a fw_fatal_event_mode for enabling MFDE trap and stopping FW. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
INTERNAL severity is unexpected state without systemic failure. Enable INTERNAL asserts in MFGD register to be used for better coverage. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
MFGD.fw_fatal_event_mode has a new mode for enabling MFDE trap and stopping FW on assert. Use that event mode in testing mode in order to create a more accurate dump. Suggested-by: Miryam Levy <miryaml@nvidia.com> > I requested it to be 2 and I don’t see a reason why it should be changed, for > tests environment it is always 2, also for SDK verification. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Bumps [pip](https://github.com/pypa/pip) from 23.2.1 to 23.3. - [Changelog](https://github.com/pypa/pip/blob/main/NEWS.rst) - [Commits](pypa/pip@23.2.1...23.3) --- updated-dependencies: - dependency-name: pip dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
63e7849 to
f6fee35
Compare
f6fee35 to
598db5e
Compare
pmachata
pushed a commit
that referenced
this pull request
Dec 1, 2025
MT7996 driver can use both wed and wed_hif2 devices to offload traffic from/to the wireless NIC. In the current codebase we assume to always use the primary wed device in wed callbacks resulting in the following crash if the hw runs wed_hif2 (e.g. 6GHz link). [ 297.455876] Unable to handle kernel read from unreadable memory at virtual address 000000000000080a [ 297.464928] Mem abort info: [ 297.467722] ESR = 0x0000000096000005 [ 297.471461] EC = 0x25: DABT (current EL), IL = 32 bits [ 297.476766] SET = 0, FnV = 0 [ 297.479809] EA = 0, S1PTW = 0 [ 297.482940] FSC = 0x05: level 1 translation fault [ 297.487809] Data abort info: [ 297.490679] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 297.496156] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 297.501196] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 297.506500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000107480000 [ 297.512927] [000000000000080a] pgd=08000001097fb003, p4d=08000001097fb003, pud=08000001097fb003, pmd=0000000000000000 [ 297.523532] Internal error: Oops: 0000000096000005 [#1] SMP [ 297.715393] CPU: 2 UID: 0 PID: 45 Comm: kworker/u16:2 Tainted: G O 6.12.50 #0 [ 297.723908] Tainted: [O]=OOT_MODULE [ 297.727384] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT) [ 297.732857] Workqueue: nf_ft_offload_del nf_flow_rule_route_ipv6 [nf_flow_table] [ 297.740254] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 297.747205] pc : mt76_wed_offload_disable+0x64/0xa0 [mt76] [ 297.752688] lr : mtk_wed_flow_remove+0x58/0x80 [ 297.757126] sp : ffffffc080fe3ae0 [ 297.760430] x29: ffffffc080fe3ae0 x28: ffffffc080fe3be0 x27: 00000000deadbef7 [ 297.767557] x26: ffffff80c5ebca00 x25: 0000000000000001 x24: ffffff80c85f4c00 [ 297.774683] x23: ffffff80c1875b78 x22: ffffffc080d42cd0 x21: ffffffc080660018 [ 297.781809] x20: ffffff80c6a076d0 x19: ffffff80c6a043c8 x18: 0000000000000000 [ 297.788935] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000 [ 297.796060] x14: 0000000000000019 x13: ffffff80c0ad8ec0 x12: 00000000fa83b2da [ 297.803185] x11: ffffff80c02700c0 x10: ffffff80c0ad8ec0 x9 : ffffff81fef96200 [ 297.810311] x8 : ffffff80c02700c0 x7 : ffffff80c02700d0 x6 : 0000000000000002 [ 297.817435] x5 : 0000000000000400 x4 : 0000000000000000 x3 : 0000000000000000 [ 297.824561] x2 : 0000000000000001 x1 : 0000000000000800 x0 : ffffff80c6a063c8 [ 297.831686] Call trace: [ 297.834123] mt76_wed_offload_disable+0x64/0xa0 [mt76] [ 297.839254] mtk_wed_flow_remove+0x58/0x80 [ 297.843342] mtk_flow_offload_cmd+0x434/0x574 [ 297.847689] mtk_wed_setup_tc_block_cb+0x30/0x40 [ 297.852295] nf_flow_offload_ipv6_hook+0x7f4/0x964 [nf_flow_table] [ 297.858466] nf_flow_rule_route_ipv6+0x438/0x4a4 [nf_flow_table] [ 297.864463] process_one_work+0x174/0x300 [ 297.868465] worker_thread+0x278/0x430 [ 297.872204] kthread+0xd8/0xdc [ 297.875251] ret_from_fork+0x10/0x20 [ 297.878820] Code: 928b5ae0 8b000273 91400a60 f943fa61 (79401421) [ 297.884901] ---[ end trace 0000000000000000 ]--- Fix the issue detecting the proper wed reference to use running wed callabacks. Fixes: 83eafc9 ("wifi: mt76: mt7996: add wed tx support") Tested-by: Daniel Pawlik <pawlik.dan@gmail.com> Tested-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20251008-wed-fixes-v1-1-8f7678583385@kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name>
pmachata
pushed a commit
that referenced
this pull request
Dec 1, 2025
…stats Cited commit added a dedicated mutex (instead of RTNL) to protect the multicast route list, so that it will not change while the driver periodically traverses it in order to update the kernel about multicast route stats that were queried from the device. One instance of list entry deletion (during route replace) was missed and it can result in a use-after-free [1]. Fix by acquiring the mutex before deleting the entry from the list and releasing it afterwards. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] Read of size 8 at addr ffff8881523c2fa8 by task kworker/2:5/22043 CPU: 2 UID: 0 PID: 22043 Comm: kworker/2:5 Not tainted 6.18.0-rc1-custom-g1a3d6d7cd014 #1 PREEMPT(full) Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017 Workqueue: mlxsw_core mlxsw_sp_mr_stats_update [mlxsw_spectrum] Call Trace: <TASK> dump_stack_lvl+0xba/0x110 print_report+0x174/0x4f5 kasan_report+0xdf/0x110 mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 mlxsw_sp_mr_route_add+0xd8/0x4770 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Freed by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x43/0x70 kfree+0x14e/0x700 mlxsw_sp_mr_route_add+0x2dea/0x4770 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:444 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Fixes: f38656d ("mlxsw: spectrum_mr: Protect multicast route list with a lock") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com>
fbc947f to
ad75ef2
Compare
pmachata
pushed a commit
that referenced
this pull request
Dec 2, 2025
Testing in two circumstances: 1. back to back optical SFP+ connection between two LS1028A-QDS ports with the SCH-26908 riser card 2. T1042 with on-board AQR115 PHY using "OCSGMII", as per https://lore.kernel.org/lkml/aIuEvaSCIQdJWcZx@FUE-ALEWI-WINX/ strongly suggests that enabling in-band auto-negotiation is actually possible when the lane baud rate is 3.125 Gbps. It was previously thought that this would not be the case, because it was only tested on 2500base-x links with on-board Aquantia PHYs, where it was noticed that MII_LPA is always reported as zero, and it was thought that this is because of the PCS. Test case #1 above shows it is not, and the configured MII_ADVERTISE on system A ends up in the MII_LPA on system B, when in 2500base-x mode (IF_MODE=0). Test case #2, which uses "SGMII" auto-negotiation (IF_MODE=3) for the 3.125 Gbps lane, is actually a misconfiguration, but it is what led to the discovery. There is actually an old bug in the Lynx PCS driver - it expects all register values to contain their default out-of-reset values, as if the PCS were initialized by the Reset Configuration Word (RCW) settings. There are 2 cases in which this is problematic: - if the bootloader (or previous kexec-enabled Linux) wrote a different IF_MODE value - if dynamically changing the SerDes protocol from 1000base-x to 2500base-x, e.g. by replacing the optical SFP module. Specifically in test case #2, an accidental alignment between the bootloader configuring the PCS to expect SGMII in-band code words, and the AQR115 PHY actually transmitting SGMII in-band code words when operating in the "OCSGMII" system interface protocol, led to the PCS transmitting replicated symbols at 3.125 Gbps baud rate. This could only have happened if the PCS saw and reacted to the SGMII code words in the first place. Since test #2 is invalid from a protocol perspective (there seems to be no standard way of negotiating the data rate of 2500 Mbps with SGMII, and the lower data rates should remain 10/100/1000), in-band auto-negotiation for 2500base-x effectively means Clause 37 (i.e. IF_MODE=0). Make 2500base-x be treated like 1000base-x in this regard, by removing all prior limitations and calling lynx_pcs_config_giga(). This adds a new feature: LINK_INBAND_ENABLE and at the same time fixes the Lynx PCS's long standing problem that the registers (specifically IF_MODE, but others could be misconfigured as well) are not written by the driver to the known valid values for 2500base-x. Co-developed-by: Alexander Wilhelm <alexander.wilhelm@westermo.com> Signed-off-by: Alexander Wilhelm <alexander.wilhelm@westermo.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251125103507.749654-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pmachata
pushed a commit
that referenced
this pull request
Dec 2, 2025
…stats Cited commit added a dedicated mutex (instead of RTNL) to protect the multicast route list, so that it will not change while the driver periodically traverses it in order to update the kernel about multicast route stats that were queried from the device. One instance of list entry deletion (during route replace) was missed and it can result in a use-after-free [1]. Fix by acquiring the mutex before deleting the entry from the list and releasing it afterwards. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] Read of size 8 at addr ffff8881523c2fa8 by task kworker/2:5/22043 CPU: 2 UID: 0 PID: 22043 Comm: kworker/2:5 Not tainted 6.18.0-rc1-custom-g1a3d6d7cd014 #1 PREEMPT(full) Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017 Workqueue: mlxsw_core mlxsw_sp_mr_stats_update [mlxsw_spectrum] Call Trace: <TASK> dump_stack_lvl+0xba/0x110 print_report+0x174/0x4f5 kasan_report+0xdf/0x110 mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 mlxsw_sp_mr_route_add+0xd8/0x4770 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Freed by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x43/0x70 kfree+0x14e/0x700 mlxsw_sp_mr_route_add+0x2dea/0x4770 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:444 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Fixes: f38656d ("mlxsw: spectrum_mr: Protect multicast route list with a lock") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com>
ad75ef2 to
9705b04
Compare
pmachata
pushed a commit
that referenced
this pull request
Dec 3, 2025
Neither sock4 nor sock6 pointers are guaranteed to be non-NULL in vxlan_xmit_one, e.g. if the iface is brought down. This can lead to the following NULL dereference: BUG: kernel NULL pointer dereference, address: 0000000000000010 Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:vxlan_xmit_one+0xbb3/0x1580 Call Trace: vxlan_xmit+0x429/0x610 dev_hard_start_xmit+0x55/0xa0 __dev_queue_xmit+0x6d0/0x7f0 ip_finish_output2+0x24b/0x590 ip_output+0x63/0x110 Mentioned commits changed the code path in vxlan_xmit_one and as a side effect the sock4/6 pointer validity checks in vxlan(6)_get_route were lost. Fix this by adding back checks. Since both commits being fixed were released in the same version (v6.7) and are strongly related, bundle the fixes in a single commit. Reported-by: Liang Li <liali@redhat.com> Fixes: 6f19b2c ("vxlan: use generic function for tunnel IPv4 route lookup") Fixes: 2aceb89 ("vxlan: use generic function for tunnel IPv6 route lookup") Cc: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251126102627.74223-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pmachata
pushed a commit
that referenced
this pull request
Dec 3, 2025
…stats Cited commit added a dedicated mutex (instead of RTNL) to protect the multicast route list, so that it will not change while the driver periodically traverses it in order to update the kernel about multicast route stats that were queried from the device. One instance of list entry deletion (during route replace) was missed and it can result in a use-after-free [1]. Fix by acquiring the mutex before deleting the entry from the list and releasing it afterwards. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] Read of size 8 at addr ffff8881523c2fa8 by task kworker/2:5/22043 CPU: 2 UID: 0 PID: 22043 Comm: kworker/2:5 Not tainted 6.18.0-rc1-custom-g1a3d6d7cd014 #1 PREEMPT(full) Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017 Workqueue: mlxsw_core mlxsw_sp_mr_stats_update [mlxsw_spectrum] Call Trace: <TASK> dump_stack_lvl+0xba/0x110 print_report+0x174/0x4f5 kasan_report+0xdf/0x110 mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 mlxsw_sp_mr_route_add+0xd8/0x4770 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Freed by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x43/0x70 kfree+0x14e/0x700 mlxsw_sp_mr_route_add+0x2dea/0x4770 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:444 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Fixes: f38656d ("mlxsw: spectrum_mr: Protect multicast route list with a lock") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com>
b8414f4 to
81cdcfa
Compare
pmachata
pushed a commit
to pmachata/linux_mlxsw
that referenced
this pull request
Dec 18, 2025
…stats Cited commit added a dedicated mutex (instead of RTNL) to protect the multicast route list, so that it will not change while the driver periodically traverses it in order to update the kernel about multicast route stats that were queried from the device. One instance of list entry deletion (during route replace) was missed and it can result in a use-after-free [1]. Fix by acquiring the mutex before deleting the entry from the list and releasing it afterwards. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] Read of size 8 at addr ffff8881523c2fa8 by task kworker/2:5/22043 CPU: 2 UID: 0 PID: 22043 Comm: kworker/2:5 Not tainted 6.18.0-rc1-custom-g1a3d6d7cd014 jpirko#1 PREEMPT(full) Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017 Workqueue: mlxsw_core mlxsw_sp_mr_stats_update [mlxsw_spectrum] Call Trace: <TASK> dump_stack_lvl+0xba/0x110 print_report+0x174/0x4f5 kasan_report+0xdf/0x110 mlxsw_sp_mr_stats_update+0x4a5/0x540 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:1006 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 mlxsw_sp_mr_route_add+0xd8/0x4770 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Freed by task 29933: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x43/0x70 kfree+0x14e/0x700 mlxsw_sp_mr_route_add+0x2dea/0x4770 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:444 [mlxsw_spectrum] mlxsw_sp_router_fibmr_event_work+0x371/0xad0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7965 [mlxsw_spectrum] process_one_work+0x9cc/0x18e0 worker_thread+0x5df/0xe40 kthread+0x3b8/0x730 ret_from_fork+0x3e9/0x560 ret_from_fork_asm+0x1a/0x30 Fixes: f38656d ("mlxsw: spectrum_mr: Protect multicast route list with a lock") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com>
81cdcfa to
45b663a
Compare
idosch
pushed a commit
that referenced
this pull request
Dec 21, 2025
A next patch will add support for transmitting XDP frames. There is a limitation to call XDP APIs when budget is zero. In Rx flow, we just skip processing for all packets. For Tx completion, driver should free the SKBs, but can not handle XDP completions. To be able to separate such handling, dedicate SDQ #1 for XDP frames. The driver already dedicates SDQ #0 for EMADs. The appropriate CQ is also dedicated for XDP, as it is dedicated for specific SDQ. Add an enum for SDQ reserved indexes, instead of MLXSW_PCI_SDQ_EMAD_INDEX macro. Derive minimum required SDQs from the new enum. Adjust the exiting code to use the enum and to take into account the reserved SDQ for XDP. Next patches will use this queue. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
45b663a to
70192d3
Compare
70192d3 to
9c29a74
Compare
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
With the RAPL PMU addition, there is a recursive locking when CPU online callback function calls rapl_package_add_pmu(). Here cpu_hotplug_lock is already acquired by cpuhp_thread_fun() and rapl_package_add_pmu() tries to acquire again. <4>[ 8.197433] ============================================ <4>[ 8.197437] WARNING: possible recursive locking detected <4>[ 8.197440] 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 Not tainted <4>[ 8.197444] -------------------------------------------- <4>[ 8.197447] cpuhp/0/20 is trying to acquire lock: <4>[ 8.197450] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: rapl_package_add_pmu+0x37/0x370 [intel_rapl_common] <4>[ 8.197463] but task is already holding lock: <4>[ 8.197466] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290 <4>[ 8.197477] other info that might help us debug this: <4>[ 8.197480] Possible unsafe locking scenario: <4>[ 8.197483] CPU0 <4>[ 8.197485] ---- <4>[ 8.197487] lock(cpu_hotplug_lock); <4>[ 8.197490] lock(cpu_hotplug_lock); <4>[ 8.197493] *** DEADLOCK *** .. .. <4>[ 8.197542] __lock_acquire+0x146e/0x2790 <4>[ 8.197548] lock_acquire+0xc4/0x2c0 <4>[ 8.197550] ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common] <4>[ 8.197556] cpus_read_lock+0x41/0x110 <4>[ 8.197558] ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common] <4>[ 8.197561] rapl_package_add_pmu+0x37/0x370 [intel_rapl_common] <4>[ 8.197565] rapl_cpu_online+0x85/0x87 [intel_rapl_msr] <4>[ 8.197568] ? __pfx_rapl_cpu_online+0x10/0x10 [intel_rapl_msr] <4>[ 8.197570] cpuhp_invoke_callback+0x41f/0x6c0 <4>[ 8.197573] ? cpuhp_thread_fun+0x6d/0x290 <4>[ 8.197575] cpuhp_thread_fun+0x1e2/0x290 <4>[ 8.197578] ? smpboot_thread_fn+0x26/0x290 <4>[ 8.197581] smpboot_thread_fn+0x12f/0x290 <4>[ 8.197584] ? __pfx_smpboot_thread_fn+0x10/0x10 <4>[ 8.197586] kthread+0x11f/0x250 <4>[ 8.197589] ? __pfx_kthread+0x10/0x10 <4>[ 8.197592] ret_from_fork+0x344/0x3a0 <4>[ 8.197595] ? __pfx_kthread+0x10/0x10 <4>[ 8.197597] ret_from_fork_asm+0x1a/0x30 <4>[ 8.197604] </TASK> Fix this issue in the same way as rapl powercap package domain is added from the same CPU online callback by introducing another interface which doesn't call cpus_read_lock(). Add rapl_package_add_pmu_locked() and rapl_package_remove_pmu_locked() which don't call cpus_read_lock(). Fixes: 748d6ba ("powercap: intel_rapl: Enable MSR-based RAPL PMU support") Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com> Closes: https://lore.kernel.org/linux-pm/5427ede1-57a0-43d1-99f3-8ca4b0643e82@intel.com/T/#u Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Tested-by: RavitejaX Veesam <ravitejax.veesam@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20251217153455.3560176-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
Avoid a possible UAF in GPU recovery due to a race between the sched timeout callback and the tdr work queue. The gpu recovery function calls drm_sched_stop() and later drm_sched_start(). drm_sched_start() restarts the tdr queue which will eventually free the job. If the tdr queue frees the job before time out callback completes, the job will be freed and we'll get a UAF when accessing the pasid. Cache it early to avoid the UAF. Example KASAN trace: [ 493.058141] BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.067530] Read of size 4 at addr ffff88b0ce3f794c by task kworker/u128:1/323 [ 493.074892] [ 493.076485] CPU: 9 UID: 0 PID: 323 Comm: kworker/u128:1 Tainted: G E 6.16.0-1289896.2.zuul.bf4f11df81c1410bbe901c4373305a31 #1 PREEMPT(voluntary) [ 493.076493] Tainted: [E]=UNSIGNED_MODULE [ 493.076495] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 493.076500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] [ 493.076512] Call Trace: [ 493.076515] <TASK> [ 493.076518] dump_stack_lvl+0x64/0x80 [ 493.076529] print_report+0xce/0x630 [ 493.076536] ? _raw_spin_lock_irqsave+0x86/0xd0 [ 493.076541] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 493.076545] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077253] kasan_report+0xb8/0xf0 [ 493.077258] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077965] amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.078672] ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu] [ 493.079378] ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu] [ 493.080111] amdgpu_job_timedout+0x642/0x1400 [amdgpu] [ 493.080903] ? pick_task_fair+0x24e/0x330 [ 493.080910] ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu] [ 493.081702] ? _raw_spin_lock+0x75/0xc0 [ 493.081708] ? __pfx__raw_spin_lock+0x10/0x10 [ 493.081712] drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched] [ 493.081721] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081725] process_one_work+0x679/0xff0 [ 493.081732] worker_thread+0x6ce/0xfd0 [ 493.081736] ? __pfx_worker_thread+0x10/0x10 [ 493.081739] kthread+0x376/0x730 [ 493.081744] ? __pfx_kthread+0x10/0x10 [ 493.081748] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081751] ? __pfx_kthread+0x10/0x10 [ 493.081755] ret_from_fork+0x247/0x330 [ 493.081761] ? __pfx_kthread+0x10/0x10 [ 493.081764] ret_from_fork_asm+0x1a/0x30 [ 493.081771] </TASK> Fixes: a72002c ("drm/amdgpu: Make use of drm_wedge_task_info") Link: HansKristian-Work/vkd3d-proton#2670 Cc: SRINIVASAN.SHANMUGAM@amd.com Cc: vitaly.prosyak@amd.com Cc: christian.koenig@amd.com Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 20880a3fd5dd7bca1a079534cf6596bda92e107d)
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
A race condition was found in sg_proc_debug_helper(). It was observed on a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring /proc/scsi/sg/debug every second. A very large elapsed time would sometimes appear. This is caused by two race conditions. We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64 architecture. A patched kernel was built, and the race condition could not be observed anymore after the application of this patch. A reproducer C program utilising the scsi_debug module was also built by Changhui Zhong and can be viewed here: https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c The first race happens between the reading of hp->duration in sg_proc_debug_helper() and request completion in sg_rq_end_io(). The hp->duration member variable may hold either of two types of information: #1 - The start time of the request. This value is present while the request is not yet finished. #2 - The total execution time of the request (end_time - start_time). If sg_proc_debug_helper() executes *after* the value of hp->duration was changed from #1 to #2, but *before* srp->done is set to 1 in sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the elapsed time (value type #2) is subtracted from a timestamp, which cannot yield a valid elapsed time (which is a type #2 value as well). To fix this issue, the value of hp->duration must change under the protection of the sfp->rq_list_lock in sg_rq_end_io(). Since sg_proc_debug_helper() takes this read lock, the change to srp->done and srp->header.duration will happen atomically from the perspective of sg_proc_debug_helper() and the race condition is thus eliminated. The second race condition happens between sg_proc_debug_helper() and sg_new_write(). Even though hp->duration is set to the current time stamp in sg_add_request() under the write lock's protection, it gets overwritten by a call to get_sg_io_hdr(), which calls copy_from_user() to copy struct sg_io_hdr from userspace into kernel space. hp->duration is set to the start time again in sg_common_write(). If sg_proc_debug_helper() is called between these two calls, an arbitrary value set by userspace (usually zero) is used to compute the elapsed time. To fix this issue, hp->duration must be set to the current timestamp again after get_sg_io_hdr() returns successfully. A small race window still exists between get_sg_io_hdr() and setting hp->duration, but this window is only a few instructions wide and does not result in observable issues in practice, as confirmed by testing. Additionally, we fix the format specifier from %d to %u for printing unsigned int values in sg_proc_debug_helper(). Signed-off-by: Michal Rábek <mrabek@redhat.com> Suggested-by: Tomas Henzl <thenzl@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
When a page is freed it coalesces with a buddy into a higher order page
while possible. When the buddy page migrate type differs, it is expected
to be updated to match the one of the page being freed.
However, only the first pageblock of the buddy page is updated, while the
rest of the pageblocks are left unchanged.
That causes warnings in later expand() and other code paths (like below),
since an inconsistency between migration type of the list containing the
page and the page-owned pageblocks migration types is introduced.
[ 308.986589] ------------[ cut here ]------------
[ 308.987227] page type is 0, passed migratetype is 1 (nr=256)
[ 308.987275] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:812 expand+0x23c/0x270
[ 308.987293] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[ 308.987439] Unloaded tainted modules: hmac_s390(E):2
[ 308.987650] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G E 6.18.0-gcc-bpf-debug #431 PREEMPT
[ 308.987657] Tainted: [E]=UNSIGNED_MODULE
[ 308.987661] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[ 308.987666] Krnl PSW : 0404f00180000000 00000349976fa600 (expand+0x240/0x270)
[ 308.987676] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[ 308.987682] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[ 308.987688] 0000000000000005 0000034980000005 000002be803ac000 0000023efe6c8300
[ 308.987692] 0000000000000008 0000034998d57290 000002be00000100 0000023e00000008
[ 308.987696] 0000000000000000 0000000000000000 00000349976fa5fc 000002c99b1eb6f0
[ 308.987708] Krnl Code: 00000349976fa5f0: c020008a02f2 larl %r2,000003499883abd4
00000349976fa5f6: c0e5ffe3f4b5 brasl %r14,0000034997378f60
#00000349976fa5fc: af000000 mc 0,0
>00000349976fa600: a7f4ff4c brc 15,00000349976fa498
00000349976fa604: b9040026 lgr %r2,%r6
00000349976fa608: c0300088317f larl %r3,0000034998800906
00000349976fa60e: c0e5fffdb6e1 brasl %r14,00000349976b13d0
00000349976fa614: af000000 mc 0,0
[ 308.987734] Call Trace:
[ 308.987738] [<00000349976fa600>] expand+0x240/0x270
[ 308.987744] ([<00000349976fa5fc>] expand+0x23c/0x270)
[ 308.987749] [<00000349976ff95e>] rmqueue_bulk+0x71e/0x940
[ 308.987754] [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[ 308.987759] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[ 308.987763] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[ 308.987768] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[ 308.987774] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[ 308.987781] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[ 308.987786] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[ 308.987791] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[ 308.987799] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[ 308.987804] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[ 308.987809] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[ 308.987813] [<000003499734d70e>] do_exception+0x1de/0x540
[ 308.987822] [<0000034998387390>] __do_pgm_check+0x130/0x220
[ 308.987830] [<000003499839a934>] pgm_check_handler+0x114/0x160
[ 308.987838] 3 locks held by mempig_verify/5224:
[ 308.987842] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[ 308.987859] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[ 308.987871] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[ 308.987886] Last Breaking-Event-Address:
[ 308.987890] [<0000034997379096>] __warn_printk+0x136/0x140
[ 308.987897] irq event stamp: 52330356
[ 308.987901] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[ 308.987907] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[ 308.987913] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[ 308.987922] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[ 308.987929] ---[ end trace 0000000000000000 ]---
[ 308.987936] ------------[ cut here ]------------
[ 308.987940] page type is 0, passed migratetype is 1 (nr=256)
[ 308.987951] WARNING: CPU: 1 PID: 5224 at mm/page_alloc.c:860 __del_page_from_free_list+0x1be/0x1e0
[ 308.987960] Modules linked in: algif_hash(E) af_alg(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nf_tables(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) loop(E) nfnetlink(E) vsock_loopback(E) vmw_vsock_virtio_transport_common(E) vsock(E) ctcm(E) fsm(E) diag288_wdt(E) watchdog(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) aes_s390(E) des_generic(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha_common(E) paes_s390(E) crypto_engine(E) pkey_cca(E) pkey_ep11(E) zcrypt(E) rng_core(E) pkey_pckmo(E) pkey(E) autofs4(E)
[ 308.988070] Unloaded tainted modules: hmac_s390(E):2
[ 308.988087] CPU: 1 UID: 0 PID: 5224 Comm: mempig_verify Kdump: loaded Tainted: G W E 6.18.0-gcc-bpf-debug #431 PREEMPT
[ 308.988095] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[ 308.988100] Hardware name: IBM 3906 M04 704 (z/VM 7.3.0)
[ 308.988105] Krnl PSW : 0404f00180000000 00000349976f9e32 (__del_page_from_free_list+0x1c2/0x1e0)
[ 308.988118] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[ 308.988127] Krnl GPRS: 0000034980000004 0000000000000005 0000000000000030 000003499a0e6d88
[ 308.988133] 0000000000000005 0000034980000005 0000034998d57290 0000023efe6c8300
[ 308.988139] 0000000000000001 0000000000000008 000002be00000100 000002be803ac000
[ 308.988144] 0000000000000000 0000000000000001 00000349976f9e2e 000002c99b1eb728
[ 308.988153] Krnl Code: 00000349976f9e22: c020008a06d9 larl %r2,000003499883abd4
00000349976f9e28: c0e5ffe3f89c brasl %r14,0000034997378f60
#00000349976f9e2e: af000000 mc 0,0
>00000349976f9e32: a7f4ff4e brc 15,00000349976f9cce
00000349976f9e36: b904002b lgr %r2,%r11
00000349976f9e3a: c030008a06e7 larl %r3,000003499883ac08
00000349976f9e40: c0e5fffdbac8 brasl %r14,00000349976b13d0
00000349976f9e46: af000000 mc 0,0
[ 308.988184] Call Trace:
[ 308.988188] [<00000349976f9e32>] __del_page_from_free_list+0x1c2/0x1e0
[ 308.988195] ([<00000349976f9e2e>] __del_page_from_free_list+0x1be/0x1e0)
[ 308.988202] [<00000349976ff946>] rmqueue_bulk+0x706/0x940
[ 308.988208] [<00000349976ffd7e>] __rmqueue_pcplist+0x1fe/0x2a0
[ 308.988214] [<0000034997700966>] rmqueue.isra.0+0xb46/0xf40
[ 308.988221] [<0000034997703ec8>] get_page_from_freelist+0x198/0x8d0
[ 308.988227] [<0000034997706fa8>] __alloc_frozen_pages_noprof+0x198/0x400
[ 308.988233] [<00000349977536f8>] alloc_pages_mpol+0xb8/0x220
[ 308.988240] [<0000034997753bf6>] folio_alloc_mpol_noprof+0x26/0xc0
[ 308.988247] [<0000034997753e4c>] vma_alloc_folio_noprof+0x6c/0xa0
[ 308.988253] [<0000034997775b22>] vma_alloc_anon_folio_pmd+0x42/0x240
[ 308.988260] [<000003499777bfea>] __do_huge_pmd_anonymous_page+0x3a/0x210
[ 308.988267] [<00000349976cb08e>] __handle_mm_fault+0x4de/0x500
[ 308.988273] [<00000349976cb14c>] handle_mm_fault+0x9c/0x3a0
[ 308.988279] [<000003499734d70e>] do_exception+0x1de/0x540
[ 308.988286] [<0000034998387390>] __do_pgm_check+0x130/0x220
[ 308.988293] [<000003499839a934>] pgm_check_handler+0x114/0x160
[ 308.988300] 3 locks held by mempig_verify/5224:
[ 308.988305] #0: 0000023ea44c1e08 (vm_lock){++++}-{0:0}, at: lock_vma_under_rcu+0xb2/0x2a0
[ 308.988322] #1: 0000023ee4d41b18 (&pcp->lock){+.+.}-{2:2}, at: rmqueue.isra.0+0xad6/0xf40
[ 308.988334] #2: 0000023efe6c8998 (&zone->lock){..-.}-{2:2}, at: rmqueue_bulk+0x5a/0x940
[ 308.988346] Last Breaking-Event-Address:
[ 308.988350] [<0000034997379096>] __warn_printk+0x136/0x140
[ 308.988356] irq event stamp: 52330356
[ 308.988360] hardirqs last enabled at (52330355): [<000003499838742e>] __do_pgm_check+0x1ce/0x220
[ 308.988366] hardirqs last disabled at (52330356): [<000003499839932e>] _raw_spin_lock_irqsave+0x9e/0xe0
[ 308.988373] softirqs last enabled at (52329882): [<0000034997383786>] handle_softirqs+0x2c6/0x530
[ 308.988380] softirqs last disabled at (52329859): [<0000034997382f86>] __irq_exit_rcu+0x126/0x140
[ 308.988388] ---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20251215081002.3353900A9c-agordeev@linux.ibm.com
Link: https://lkml.kernel.org/r/20251212151457.3898073Add-agordeev@linux.ibm.com
Fixes: e6cf9e1 ("mm: page_alloc: fix up block types when merging compatible blocks")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Closes: https://lore.kernel.org/linux-mm/87wmalyktd.fsf@linux.ibm.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Marc Hartmayer <mhartmay@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
A null pointer dereference in handshake_complete() was observed [1]. When handshake_req_next() return NULL in handshake_nl_accept_doit(), function handshake_complete() will be called unexpectedly which triggers this crash. Fix it by goto out_status when req is NULL. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI RIP: 0010:handshake_complete+0x36/0x2b0 net/handshake/request.c:288 Call Trace: <TASK> handshake_nl_accept_doit+0x32d/0x7e0 net/handshake/netlink.c:129 genl_family_rcv_msg_doit+0x204/0x300 net/netlink/genetlink.c:1115 genl_family_rcv_msg+0x436/0x670 net/netlink/genetlink.c:1195 genl_rcv_msg+0xcc/0x170 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x14c/0x430 net/netlink/af_netlink.c:2550 genl_rcv+0x2d/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x878/0xb20 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x897/0xd70 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa39/0xbf0 net/socket.c:2592 ___sys_sendmsg+0x121/0x1c0 net/socket.c:2646 __sys_sendmsg+0x155/0x200 net/socket.c:2678 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x5f/0x350 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> Fixes: fe67b06 ("net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/kernel-tls-handshake/aScekpuOYHRM9uOd@morisot.1015granger.net/T/#m7cfa5c11efc626d77622b2981591197a2acdd65e Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251212012723.4111831-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
…nged() There has been a syzkaller bug reported recently with the following trace: list_del corruption, ffff888058bea080->prev is LIST_POISON2 (dead000000000122) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:59! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 3 UID: 0 PID: 21246 Comm: syz.0.2928 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:__list_del_entry_valid_or_report+0x13e/0x200 lib/list_debug.c:59 Code: 48 c7 c7 e0 71 f0 8b e8 30 08 ef fc 90 0f 0b 48 89 ef e8 a5 02 55 fd 48 89 ea 48 89 de 48 c7 c7 40 72 f0 8b e8 13 08 ef fc 90 <0f> 0b 48 89 ef e8 88 02 55 fd 48 89 ea 48 b8 00 00 00 00 00 fc ff RSP: 0018:ffffc9000d49f370 EFLAGS: 00010286 RAX: 000000000000004e RBX: ffff888058bea080 RCX: ffffc9002817d000 RDX: 0000000000000000 RSI: ffffffff819becc6 RDI: 0000000000000005 RBP: dead000000000122 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000001 R12: ffff888039e9c230 R13: ffff888058bea088 R14: ffff888058bea080 R15: ffff888055461480 FS: 00007fbbcfe6f6c0(0000) GS:ffff8880d6d0a000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000110c3afcb0 CR3: 00000000382c7000 CR4: 0000000000352ef0 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:132 [inline] __list_del_entry include/linux/list.h:223 [inline] list_del_rcu include/linux/rculist.h:178 [inline] __team_queue_override_port_del drivers/net/team/team_core.c:826 [inline] __team_queue_override_port_del drivers/net/team/team_core.c:821 [inline] team_queue_override_port_prio_changed drivers/net/team/team_core.c:883 [inline] team_priority_option_set+0x171/0x2f0 drivers/net/team/team_core.c:1534 team_option_set drivers/net/team/team_core.c:376 [inline] team_nl_options_set_doit+0x8ae/0xe60 drivers/net/team/team_core.c:2653 genl_family_rcv_msg_doit+0x209/0x2f0 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa98/0xc70 net/socket.c:2630 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2684 __sys_sendmsg+0x16d/0x220 net/socket.c:2716 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The problem is in this flow: 1) Port is enabled, queue_id != 0, in qom_list 2) Port gets disabled -> team_port_disable() -> team_queue_override_port_del() -> del (removed from list) 3) Port is disabled, queue_id != 0, not in any list 4) Priority changes -> team_queue_override_port_prio_changed() -> checks: port disabled && queue_id != 0 -> calls del - hits the BUG as it is removed already To fix this, change the check in team_queue_override_port_prio_changed() so it returns early if port is not enabled. Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f Fixes: 6c31ff3 ("team: remove synchronize_rcu() called during queue override change") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251212102953.167287-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
After the blamed commit below, if the MPC subflow is already in TCP_CLOSE status or has fallback to TCP at mptcp_disconnect() time, mptcp_do_fastclose() skips setting the `send_fastclose flag` and the later __mptcp_close_ssk() does not reset anymore the related subflow context. Any later connection will be created with both the `request_mptcp` flag and the msk-level fallback status off (it is unconditionally cleared at MPTCP disconnect time), leading to a warning in subflow_data_ready(): WARNING: CPU: 26 PID: 8996 at net/mptcp/subflow.c:1519 subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13)) Modules linked in: CPU: 26 UID: 0 PID: 8996 Comm: syz.22.39 Not tainted 6.18.0-rc7-05427-g11fc074f6c36 #1 PREEMPT(voluntary) Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13)) Code: 90 0f 0b 90 90 e9 04 fe ff ff e8 b7 1e f5 fe 89 ee bf 07 00 00 00 e8 db 19 f5 fe 83 fd 07 0f 84 35 ff ff ff e8 9d 1e f5 fe 90 <0f> 0b 90 e9 27 ff ff ff e8 8f 1e f5 fe 4c 89 e7 48 89 de e8 14 09 RSP: 0018:ffffc9002646fb30 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88813b218000 RCX: ffffffff825c8435 RDX: ffff8881300b3580 RSI: ffffffff825c8443 RDI: 0000000000000005 RBP: 000000000000000b R08: ffffffff825c8435 R09: 000000000000000b R10: 0000000000000005 R11: 0000000000000007 R12: ffff888131ac0000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f88330af6c0(0000) GS:ffff888a93dd2000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f88330aefe8 CR3: 000000010ff59000 CR4: 0000000000350ef0 Call Trace: <TASK> tcp_data_ready (net/ipv4/tcp_input.c:5356) tcp_data_queue (net/ipv4/tcp_input.c:5445) tcp_rcv_state_process (net/ipv4/tcp_input.c:7165) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1955) __release_sock (include/net/sock.h:1158 (discriminator 6) net/core/sock.c:3180 (discriminator 6)) release_sock (net/core/sock.c:3737) mptcp_sendmsg (net/mptcp/protocol.c:1763 net/mptcp/protocol.c:1857) inet_sendmsg (net/ipv4/af_inet.c:853 (discriminator 7)) __sys_sendto (net/socket.c:727 (discriminator 15) net/socket.c:742 (discriminator 15) net/socket.c:2244 (discriminator 15)) __x64_sys_sendto (net/socket.c:2247) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f883326702d Address the issue setting an explicit `fastclosing` flag at fastclose time, and checking such flag after mptcp_do_fastclose(). Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-2-d1f9fd1c36c8@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
In e1000_tbi_should_accept() we read the last byte of the frame via 'data[length - 1]' to evaluate the TBI workaround. If the descriptor- reported length is zero or larger than the actual RX buffer size, this read goes out of bounds and can hit unrelated slab objects. The issue is observed from the NAPI receive path (e1000_clean_rx_irq): ================================================================== BUG: KASAN: slab-out-of-bounds in e1000_tbi_should_accept+0x610/0x790 Read of size 1 at addr ffff888014114e54 by task sshd/363 CPU: 0 PID: 363 Comm: sshd Not tainted 5.18.0-rc1 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x5a/0x74 print_address_description+0x7b/0x440 print_report+0x101/0x200 kasan_report+0xc1/0xf0 e1000_tbi_should_accept+0x610/0x790 e1000_clean_rx_irq+0xa8c/0x1110 e1000_clean+0xde2/0x3c10 __napi_poll+0x98/0x380 net_rx_action+0x491/0xa20 __do_softirq+0x2c9/0x61d do_softirq+0xd1/0x120 </IRQ> <TASK> __local_bh_enable_ip+0xfe/0x130 ip_finish_output2+0x7d5/0xb00 __ip_queue_xmit+0xe24/0x1ab0 __tcp_transmit_skb+0x1bcb/0x3340 tcp_write_xmit+0x175d/0x6bd0 __tcp_push_pending_frames+0x7b/0x280 tcp_sendmsg_locked+0x2e4f/0x32d0 tcp_sendmsg+0x24/0x40 sock_write_iter+0x322/0x430 vfs_write+0x56c/0xa60 ksys_write+0xd1/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f511b476b10 Code: 73 01 c3 48 8b 0d 88 d3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d f9 2b 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 8e 9b 01 00 48 89 04 24 RSP: 002b:00007ffc9211d4e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000004024 RCX: 00007f511b476b10 RDX: 0000000000004024 RSI: 0000559a9385962c RDI: 0000000000000003 RBP: 0000559a9383a400 R08: fffffffffffffff0 R09: 0000000000004f00 R10: 0000000000000070 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc9211d57f R14: 0000559a9347bde7 R15: 0000000000000003 </TASK> Allocated by task 1: __kasan_krealloc+0x131/0x1c0 krealloc+0x90/0xc0 add_sysfs_param+0xcb/0x8a0 kernel_add_sysfs_param+0x81/0xd4 param_sysfs_builtin+0x138/0x1a6 param_sysfs_init+0x57/0x5b do_one_initcall+0x104/0x250 do_initcall_level+0x102/0x132 do_initcalls+0x46/0x74 kernel_init_freeable+0x28f/0x393 kernel_init+0x14/0x1a0 ret_from_fork+0x22/0x30 The buggy address belongs to the object at ffff888014114000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1620 bytes to the right of 2048-byte region [ffff888014114000, ffff888014114800] The buggy address belongs to the physical page: page:ffffea0000504400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14110 head:ffffea0000504400 order:3 compound_mapcount:0 compound_pincount:0 flags: 0x100000000010200(slab|head|node=0|zone=1) raw: 0100000000010200 0000000000000000 dead000000000001 ffff888013442000 raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected ================================================================== This happens because the TBI check unconditionally dereferences the last byte without validating the reported length first: u8 last_byte = *(data + length - 1); Fix by rejecting the frame early if the length is zero, or if it exceeds adapter->rx_buffer_len. This preserves the TBI workaround semantics for valid frames and prevents touching memory beyond the RX buffer. Fixes: 2037110 ("e1000: move tbi workaround code into helper function") Cc: stable@vger.kernel.org Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan(). mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can invoke the driver's remove path and free the gdma_context associated with the device. After returning, mana_serv_reset() currently jumps to the out label and attempts to clear gc->in_service, dereferencing a freed gdma_context. The issue was observed with the following call logs: [ 698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d [ 698.943121] #PF: supervisor write access in kernel mode [ 698.943423] #PF: error_code(0x0002) - not-present page [S[ 698.943793] Pat Dec 6 07:GD5 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0 0:56 2025] hv_[n e 698.944283] Oops: Oops: 0002 [#1] SMP NOPTI tvsc f8615163-00[ 698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1 ... [Sat Dec 6 07:50:56 2025] R10: [ 699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16 000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000 [Sat Dec 6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405 [Sat Dec 6 07:50:56 2025] FS: 0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000 [Sat Dec 6 07:50:56 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Sat Dec 6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0 [Sat Dec 6 07:50:56 2025] Call Trace: [Sat Dec 6 07:50:56 2025] <TASK> [Sat Dec 6 07:50:56 2025] mana_serv_func+0x24/0x50 [mana] [Sat Dec 6 07:50:56 2025] process_one_work+0x190/0x350 [Sat Dec 6 07:50:56 2025] worker_thread+0x2b7/0x3d0 [Sat Dec 6 07:50:56 2025] kthread+0xf3/0x200 [Sat Dec 6 07:50:56 2025] ? __pfx_worker_thread+0x10/0x10 [Sat Dec 6 07:50:56 2025] ? __pfx_kthread+0x10/0x10 [Sat Dec 6 07:50:56 2025] ret_from_fork+0x21a/0x250 [Sat Dec 6 07:50:56 2025] ? __pfx_kthread+0x10/0x10 [Sat Dec 6 07:50:56 2025] ret_from_fork_asm+0x1a/0x30 [Sat Dec 6 07:50:56 2025] </TASK> Fix this by returning immediately after mana_serv_rescan() to avoid accessing GC state that may no longer be valid. Fixes: 9bf6603 ("net: mana: Handle hardware recovery events when probing the device") Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20251218131054.GA3173@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
There is a crash issue when running zero copy XDP_TX action, the crash log is shown below. [ 216.122464] Unable to handle kernel paging request at virtual address fffeffff80000000 [ 216.187524] Internal error: Oops: 0000000096000144 [#1] SMP [ 216.301694] Call trace: [ 216.304130] dcache_clean_poc+0x20/0x38 (P) [ 216.308308] __dma_sync_single_for_device+0x1bc/0x1e0 [ 216.313351] stmmac_xdp_xmit_xdpf+0x354/0x400 [ 216.317701] __stmmac_xdp_run_prog+0x164/0x368 [ 216.322139] stmmac_napi_poll_rxtx+0xba8/0xf00 [ 216.326576] __napi_poll+0x40/0x218 [ 216.408054] Kernel panic - not syncing: Oops: Fatal exception in interrupt For XDP_TX action, the xdp_buff is converted to xdp_frame by xdp_convert_buff_to_frame(). The memory type of the resulting xdp_frame depends on the memory type of the xdp_buff. For page pool based xdp_buff it produces xdp_frame with memory type MEM_TYPE_PAGE_POOL. For zero copy XSK pool based xdp_buff it produces xdp_frame with memory type MEM_TYPE_PAGE_ORDER0. However, stmmac_xdp_xmit_back() does not check the memory type and always uses the page pool type, this leads to invalid mappings and causes the crash. Therefore, check the xdp_buff memory type in stmmac_xdp_xmit_back() to fix this issue. Fixes: bba2556 ("net: stmmac: Enable RX via AF_XDP zero-copy") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20251204071332.1907111-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
syzbot reported a crash [1] in dql_completed() after recent usbnet BQL adoption. The reason for the crash is that netdev_reset_queue() is called too soon. It should be called after cancel_work_sync(&dev->bh_work) to make sure no more TX completion can happen. [1] kernel BUG at lib/dynamic_queue_limits.c:99 ! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 5197 Comm: udevd Tainted: G L syzkaller #0 PREEMPT(full) Tainted: [L]=SOFTLOCKUP Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:dql_completed+0xbe1/0xbf0 lib/dynamic_queue_limits.c:99 Call Trace: <IRQ> netdev_tx_completed_queue include/linux/netdevice.h:3864 [inline] netdev_completed_queue include/linux/netdevice.h:3894 [inline] usbnet_bh+0x793/0x1020 drivers/net/usb/usbnet.c:1601 process_one_work kernel/workqueue.c:3257 [inline] process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340 bh_worker+0x2b1/0x600 kernel/workqueue.c:3611 tasklet_action+0xc/0x70 kernel/softirq.c:952 handle_softirqs+0x27d/0x850 kernel/softirq.c:622 __do_softirq kernel/softirq.c:656 [inline] invoke_softirq kernel/softirq.c:496 [inline] __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:723 irq_exit_rcu+0x9/0x30 kernel/softirq.c:739 Fixes: 7ff14c5 ("usbnet: Add support for Byte Queue Limits (BQL)") Reported-by: syzbot+5b55e49f8bbd84631a9c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6945644f.a70a0220.207337.0113.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Simon Schippers <simon.schippers@tu-dortmund.de> Link: https://patch.msgid.link/20251219144459.692715-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
idosch
pushed a commit
that referenced
this pull request
Dec 31, 2025
… to macb_open()
In the non-RT kernel, local_bh_disable() merely disables preemption,
whereas it maps to an actual spin lock in the RT kernel. Consequently,
when attempting to refill RX buffers via netdev_alloc_skb() in
macb_mac_link_up(), a deadlock scenario arises as follows:
WARNING: possible circular locking dependency detected
6.18.0-08691-g2061f18ad76e #39 Not tainted
------------------------------------------------------
kworker/0:0/8 is trying to acquire lock:
ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c
but task is already holding lock:
ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit
+0x148/0xb7c
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&queue->tx_ptr_lock){+...}-{3:3}:
rt_spin_lock+0x50/0x1f0
macb_start_xmit+0x148/0xb7c
dev_hard_start_xmit+0x94/0x284
sch_direct_xmit+0x8c/0x37c
__dev_queue_xmit+0x708/0x1120
neigh_resolve_output+0x148/0x28c
ip6_finish_output2+0x2c0/0xb2c
__ip6_finish_output+0x114/0x308
ip6_output+0xc4/0x4a4
mld_sendpack+0x220/0x68c
mld_ifc_work+0x2a8/0x4f4
process_one_work+0x20c/0x5f8
worker_thread+0x1b0/0x35c
kthread+0x144/0x200
ret_from_fork+0x10/0x20
-> #2 (_xmit_ETHER#2){+...}-{3:3}:
rt_spin_lock+0x50/0x1f0
sch_direct_xmit+0x11c/0x37c
__dev_queue_xmit+0x708/0x1120
neigh_resolve_output+0x148/0x28c
ip6_finish_output2+0x2c0/0xb2c
__ip6_finish_output+0x114/0x308
ip6_output+0xc4/0x4a4
mld_sendpack+0x220/0x68c
mld_ifc_work+0x2a8/0x4f4
process_one_work+0x20c/0x5f8
worker_thread+0x1b0/0x35c
kthread+0x144/0x200
ret_from_fork+0x10/0x20
-> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}:
lock_release+0x250/0x348
__local_bh_enable_ip+0x7c/0x240
__netdev_alloc_skb+0x1b4/0x1d8
gem_rx_refill+0xdc/0x240
gem_init_rings+0xb4/0x108
macb_mac_link_up+0x9c/0x2b4
phylink_resolve+0x170/0x614
process_one_work+0x20c/0x5f8
worker_thread+0x1b0/0x35c
kthread+0x144/0x200
ret_from_fork+0x10/0x20
-> #0 (&bp->lock){+.+.}-{3:3}:
__lock_acquire+0x15a8/0x2084
lock_acquire+0x1cc/0x350
rt_spin_lock+0x50/0x1f0
macb_start_xmit+0x808/0xb7c
dev_hard_start_xmit+0x94/0x284
sch_direct_xmit+0x8c/0x37c
__dev_queue_xmit+0x708/0x1120
neigh_resolve_output+0x148/0x28c
ip6_finish_output2+0x2c0/0xb2c
__ip6_finish_output+0x114/0x308
ip6_output+0xc4/0x4a4
mld_sendpack+0x220/0x68c
mld_ifc_work+0x2a8/0x4f4
process_one_work+0x20c/0x5f8
worker_thread+0x1b0/0x35c
kthread+0x144/0x200
ret_from_fork+0x10/0x20
other info that might help us debug this:
Chain exists of:
&bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&queue->tx_ptr_lock);
lock(_xmit_ETHER#2);
lock(&queue->tx_ptr_lock);
lock(&bp->lock);
*** DEADLOCK ***
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0xa0/0xf0
dump_stack+0x18/0x24
print_circular_bug+0x28c/0x370
check_noncircular+0x198/0x1ac
__lock_acquire+0x15a8/0x2084
lock_acquire+0x1cc/0x350
rt_spin_lock+0x50/0x1f0
macb_start_xmit+0x808/0xb7c
dev_hard_start_xmit+0x94/0x284
sch_direct_xmit+0x8c/0x37c
__dev_queue_xmit+0x708/0x1120
neigh_resolve_output+0x148/0x28c
ip6_finish_output2+0x2c0/0xb2c
__ip6_finish_output+0x114/0x308
ip6_output+0xc4/0x4a4
mld_sendpack+0x220/0x68c
mld_ifc_work+0x2a8/0x4f4
process_one_work+0x20c/0x5f8
worker_thread+0x1b0/0x35c
kthread+0x144/0x200
ret_from_fork+0x10/0x20
Notably, invoking the mog_init_rings() callback upon link establishment
is unnecessary. Instead, we can exclusively call mog_init_rings() within
the ndo_open() callback. This adjustment resolves the deadlock issue.
Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings()
when opening the network interface via at91ether_open(), moving
mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC
check.
Fixes: 633e98a ("net: macb: use resolved link config in mac_link_up()")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://patch.msgid.link/20251222015624.1994551-1-xiaolei.wang@windriver.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2dc192c to
a46794f
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Bumps pip from 23.2.1 to 23.3.
Changelog
Sourced from pip's changelog.
... (truncated)
Commits
e3dc91dBump for release3e85558Update AUTHORS.txt8d02787Reclassify news fragmentf6ecf40Merge pull request #12350 from sbidoul/readact-collecting-url3060865Merge pull request #12335 from edmorley/patch-18f0ed32Redact URLs in Collecting... logsd1659b8Correct issue number for NEWS entry added by #121972333ef3Upgrade urllib3 to 1.26.17 (#12343)496b268Update "Running Tests" documentation (#12334)d1f0981Merge pull request #12331 from sbidoul/update-egg-deprecation-messageDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot mergewill merge this PR after your CI passes on it@dependabot squash and mergewill squash and merge this PR after your CI passes on it@dependabot cancel mergewill cancel a previously requested merge and block automerging@dependabot reopenwill reopen this PR if it is closed@dependabot closewill close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually@dependabot show <dependency name> ignore conditionswill show all of the ignore conditions of the specified dependency@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)You can disable automated security fix PRs for this repo from the Security Alerts page.